ABSTRACT



The invention provides a perceptually-based system for 
pattern retrieval and matching, suitable for use in a wide 
variety of information processing applications. An illustrative
embodiment of the system uses a predetermined
vocabulary comprising one or more dimensions to extract 
color and texture information from an information signal, 
e.g., an image, selected by a user. The system then generates 
a distance measure characterizing the relationship of the 
selected image to another image stored in a database, by 
applying a grammar, comprising a set of predetermined 
rules, to the color and texture information extracted from the 
selected image and corresponding color and texture information
associated with the stored image. The vocabulary
may include dimensions such as overall color, directionality 
and orientation, regularity and placement, color purity, and 
pattern complexity and heaviness. The rules in the grammar 
may include equal pattern, overall appearance, similar 
pattern, and dominant color and general impression, with 
each of the rules expressed as a logical combination of 
values generated for one or more of the dimensions. The 
distance measure may include separate color and texture 
metrics characterizing the similarity of the respective color 
and texture of the two images being compared. The invention
is also applicable to other types of information signals,
such as sequences of video frames. 

18 Claims, 4 Drawing Sheets 
RETRIEVAL AND MATCHING OF COLOR PATTERNS BASED ON A PREDETERMINED
VOCABULARY AND GRAMMAR

FIELD OF THE INVENTION

The present invention relates generally to techniques for
processing images, video and other types of information
signals, and more particularly to automated systems and
devices for retrieving, matching and otherwise manipulating
information signals which include color pattern information.

BACKGROUND OF THE INVENTION

Flexible retrieval and manipulation of image databases
and other types of color pattern databases has become an
important problem with applications in video editing,
photojournalism, art, fashion, cataloging, retailing, interactive
CAD, geographic data processing, etc. Until recently,
content-based retrieval (CBR) systems have generally
required a user to enter key words to search image and video
databases. Unfortunately, this approach often does not work
well, since different people describe what they see or what
they search for in different ways, and even the same person
might describe the same image differently depending on the
context in which it will be used.

One of the earliest CBR systems, known as ART
MUSEUM and described in K. Hirata and T. Kato, "Query
by visual example," Proc. of 3rd Int. Conf. on Extending
Database Technology, performs retrieval entirely based on
edge features. A commercial content-based image search
engine with profound effects on later systems was QBIC,
described in W. Niblack et al., "The QBIC project: Querying
images by content using color, texture and shape," Proc.
SPIE Storage and Retrieval for Image and Video Data Bases,
February, 1994. As color representation, this system uses a
k-element histogram and average of (R,G,B), (Y,i,q), and
(L,a,b) coordinates, whereas for the description of texture it
implements Tamura's feature set, as described in H. Tamura
et al., "Textural features corresponding to visual
perception," IEEE Transactions on Systems, Man and
Cybernetics, Vol. 8, pp. 460-473, 1982.

In a similar fashion, color, texture and shape are supported
as a set of interactive tools for browsing and searching
images in the Photobook system developed at the MIT
Media Lab, as described in A. Pentland et al., "Photobook:
Content-based manipulation of image databases,"
International Journal of Computer Vision, 1996. In addition to
providing these elementary features, systems such as
VisualSeek, described in J. R. Smith and S. Chang,
"VisualSeek: A fully automated content-based query system,"
Proc. ACM Multimedia 96, 1996, Netra, described in W. Y.
Ma and B. S. Manjunath, "Netra: A toolbox for navigating
large image databases," Proc. IEEE Int. Conf. on Image
Processing, 1997, and Virage, described in A. Gupta and R.
Jain, "Visual information retrieval," Communications of the
ACM, Vol. 40, No. 5, 1997, each support queries based on
spatial relationships and color layout. Moreover, in the
above-noted Virage system, the user can select a combination
of implemented features by adjusting the weights
according to his or her own "perception." This paradigm is
also supported in the RetrievalWare search engine described in
J. Dowe, "Content based retrieval in multimedia imaging,"
Proc. SPIE Storage and Retrieval for Image and Video
Databases, 1993.

A different approach to similarity modeling is proposed in
the MARS system, described in Y. Rui et al., "Content-based
image retrieval with relevance feed-back in Mars," Proc.
IEEE Conf. on Image Processing, 1997, where the main
focus is not on finding a best representation, but rather on the
relevance feedback that will dynamically adapt multiple
visual features to different applications and different users.
Hence, although great progress has been made, none of the
existing search engines offers a complete solution to the
image retrieval problem, and there remain significant
drawbacks with the existing techniques which prevent
their use in many important practical applications.

These drawbacks can be attributed to a very limited
understanding of color patterns compared to other visual
phenomena such as color, contrast or even gray-level
textures. For example, the basic dimensions of color patterns
have not yet been adequately identified, a standardized and
effective set of features for addressing their important
characteristics does not exist, nor are there rules defining how
these features are to be combined. Previous investigations in
this field have concentrated mainly on gray-level natural
textures, as described in the above-cited H. Tamura et
al. reference, and in A. R. Rao and G. L. Lohse, "Towards
a texture naming system: Identifying relevant dimensions of
texture," Vision Research, Vol. 36, No. 11, pp. 1649-1669, 1996.
For example, the Rao and Lohse reference focused on how
people classify textures in meaningful, hierarchically-
structured categories, identifying relevant features used in
the perception of gray-level textures. However, these
approaches fail to address the above-noted color pattern
problems, and a need remains for an effective framework for
addressing them.

SUMMARY OF THE INVENTION

The invention provides a perceptually-based system for
pattern retrieval and matching, suitable for use in a wide
variety of information processing applications. The system
is based in part on a vocabulary, i.e., a set of perceptual
criteria used in comparison between color patterns associated
with information signals, and a grammar, i.e., a set of
rules governing the use of these criteria in similarity
judgment. The system utilizes the vocabulary to extract
perceptual features of patterns from images or other types of
information signals, and then performs comparisons
between the patterns using the grammar rules. The invention
also provides new color and texture distance metrics that
correlate well with human performance in judging pattern
similarity.

An illustrative embodiment of a perceptually-based system
in accordance with the invention uses a predetermined
vocabulary comprising one or more dimensions to extract
color and texture information from an information signal,
e.g., an image, selected by a user. The system then generates
a distance measure characterizing the relationship of the
selected image to another image stored in a database, by
applying a grammar, comprising a set of predetermined
rules, to the color and texture information extracted from the
selected image and corresponding color and texture information
associated with the stored image. For example, the
system may receive the selected image in the form of an
input image A submitted in conjunction with a query from
the user. The system then measures dimensions DIM_i(A)
from the vocabulary, for i=1, . . . , N, and for each image B
from an image database, applies rules R_i from the grammar
to obtain corresponding distance measures dist_i(A, B),
where dist_i(A, B) is the distance between the images A and
B according to the rule i.

In accordance with the invention, the vocabulary may
include dimensions such as overall color, directionality and
orientation, regularity and placement, color purity, and
pattern complexity and heaviness. The rules in the grammar
may include equal pattern, overall appearance, similar
pattern, and dominant color and general impression, with
each of the rules expressed as a logical combination of
values generated for one or more of the dimensions. The
distance measure may include separate color and texture
metrics characterizing the similarity of the respective color
and texture of the two patterns being compared.

A major advantage of a pattern retrieval and matching
system in accordance with the invention is that it eliminates
the need for selecting the visual primitives for image
retrieval and expecting the user to assign weights to them, as
required in most current systems. Furthermore, the invention
is suitable for use in a wide variety of pattern domains,
including art, photography, digital museums, architecture,
interior design, and fashion.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a portion of a pattern retrieval and matching
system in accordance with the invention.

FIG. 2 shows a more detailed view of a color representation
and modeling process implemented in a feature
extraction element in the FIG. 1 system.

FIG. 3 shows a more detailed view of a texture representation
and modeling process implemented in the feature
extraction element in the FIG. 1 system.

FIG. 4 shows an exemplary communication system application
of the pattern retrieval and matching system of FIG. 1.

FIG. 5 is a flow diagram illustrating the operation of the
pattern retrieval and matching system in the communication
system of FIG. 4.

DETAILED DESCRIPTION OF THE INVENTION

The invention provides a vocabulary, i.e., a set of
perceptual criteria used in judging similarity of color patterns,
their relative importance and relationships, as well as a
grammar, i.e., a hierarchy of rules governing the use of the
vocabulary in similarity judgment. It has been determined
that these attributes are applicable to a broad range of
textures, from simple patterns to complex, high-level visual
texture phenomena. The vocabulary and grammar are utilized
in a pattern matching and retrieval system that, in an
illustrative embodiment, receives one or more information
signals as input and, depending on the type of query,
produces a set of choices modeled on human behavior in
pattern matching. The term "information signal" as used
herein is intended to include an image, a sequence of video
frames, or any other type of information signal that may be
characterized as including a pattern.

1.0 Vocabulary and Grammar of Color Patterns

The exemplary vocabulary and grammar to be described
herein have been determined through experimentation, using
multidimensional scaling and hierarchical clustering
techniques to interpret the experimental data. Multidimensional
scaling (MDS) was applied to determine the most important
dimensions of pattern similarity, while hierarchical cluster
analysis (HCA) was used to understand how people combine
these dimensions when comparing color patterns.

MDS is a well-known set of techniques that uncover the
hidden structures in data, and is described in greater detail
in J. Kruskal and M. Wish, "Multidimensional scaling,"
Sage Publications, London, 1978. MDS is designed to
analyze distance-like data called similarity data; that is, data
indicating the degree of similarity between two items.
Traditionally, similarity data is obtained via subjective
measurement. It is acquired by asking people to rank similarity
of pairs of objects, i.e., stimuli, on a scale. The obtained
similarity value connecting stimulus i to stimulus j is
denoted by δ_ij. Similarity values are arranged in a similarity
matrix Δ, usually by averaging δ_ij obtained from all
measurements. The aim of MDS is to place each stimulus from
the input set into an n-dimensional stimulus space. The
dimensionality n of the space is also determined in the
experiment. The points x_i=[x_i1 . . . x_in] representing each
stimulus are arranged so that the Euclidean distances d_ij
between each pair of points in the stimulus space match as
closely as possible the subjective similarities δ_ij between
corresponding pairs of stimuli. Types of MDS suitable for
use in conjunction with the invention include classical MDS
(CMDS) and weighted MDS (WMDS). Additional details
regarding these and other types of MDS may be found in the
above-cited J. Kruskal and M. Wish reference.

HCA is described in greater detail in R. Duda and P. Hart,
"Pattern classification and scene analysis," John Wiley &
Sons, New York, N.Y., 1973. Given a similarity matrix, HCA
organizes a set of stimuli into similar units. Therefore, HCA
can be used to determine a set of rules and the rule hierarchy
for judging similarity in pattern matching. This method
starts from the stimulus set to build a tree. Before the
procedure begins, all stimuli are considered as separate
clusters, hence there are as many clusters as there are ranked
stimuli. The tree is formed by successively joining the most
similar pairs of stimuli into new clusters. At every step,
either an individual stimulus is added to the existing clusters, or
two existing clusters are merged. The grouping continues
until all stimuli are members of a single cluster. The manner
in which the similarity matrix is updated at each stage of the
tree is determined by the joining algorithm. There are many
possible criteria for deciding how to merge clusters. Some of
the simplest methods use a nearest neighbor technique,
where the first two objects combined are those that have the
smallest distance between them. Another commonly used
technique is the furthest neighbor technique where the
distance between two clusters is obtained as the distance
between their furthest points. The centroid method calculates
the distance between two clusters as the distance between
their means. Note that, since the merging of clusters at
each step depends on the distance measure, different
distance measures can result in different clustering solutions for
the same clustering method. These and other HCA
techniques are described in detail in the above-cited R. Duda and
P. Hart reference.

Clustering techniques are often used in combination with
MDS, to clarify the obtained dimensions. However, in the
same way as with the labeling of the dimensions in the MDS
algorithm, interpretation of the clusters is usually done
subjectively and strongly depends on the quality of the data.

1.1 Vocabulary: Most Important Dimensions of Color Patterns

The above-noted vocabulary will now be described in
greater detail. Experiments were performed to determine
subjective impressions of 20 different patterns from interior
design catalogs. There were 28 subjects taking part in the
experiment, each presented with all 190 possible pairs of
patterns. For each pair, the subjects were asked to rate the
degree of overall similarity on a scale ranging from 0 for "very
different" to 100 for "very similar." There were no
instructions concerning the characteristics on which these
similarity judgments were to be made, since this was what the
experiment was designed to discover. The order of presentation
for each subject was different and was determined
through the use of a random number generator. 

The first step in the data analysis was to arrange subjects'
ratings into a similarity matrix Δ to be an input to a
two-dimensional and three-dimensional CMDS procedure.
Also, a WMDS procedure was applied to the set of 28 
individual similarity matrices. WMDS was performed in 
two, three, four, five and six dimensions. The WMDS error 
for the two-dimensional solution was 0.31, indicating that a 
higher-dimensional solution was necessary, i.e., that the 
error was still substantial. The WMDS errors for the three-, 
four-, five- and six-dimensional configurations were 0.26, 
0.20, 0.18 and 0.16, respectively. The analysis was not
extended beyond six dimensions since further increases did 
not result in a noticeable decrease of the error. 
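
The first analysis step described above, pooling the subjects' pairwise ratings into a single similarity matrix, can be sketched roughly as follows. This is an illustration only: the function name `average_similarity_matrix`, the dictionary layout of the ratings, and the choice to fill the diagonal with 100 are assumptions made for the sketch and are not details given in the text.

```python
from itertools import combinations


def average_similarity_matrix(ratings, n_patterns):
    """Average per-subject pair ratings into one symmetric matrix.

    `ratings` holds one dict per subject, mapping a pair (i, j) with
    i < j to a similarity rating on the 0..100 scale. This data layout
    is an assumption made for the sketch.
    """
    pairs = list(combinations(range(n_patterns), 2))
    # Diagonal set to 100: a pattern is treated as "very similar" to
    # itself (an assumption; the experiment rated distinct pairs only).
    matrix = [[100.0 if r == c else 0.0 for c in range(n_patterns)]
              for r in range(n_patterns)]
    for i, j in pairs:
        mean = sum(subject[(i, j)] for subject in ratings) / len(ratings)
        matrix[i][j] = matrix[j][i] = mean
    return matrix, pairs


# Toy data: 2 subjects and 3 patterns (the experiment used 28 and 20).
toy_ratings = [
    {(0, 1): 80, (0, 2): 10, (1, 2): 30},
    {(0, 1): 60, (0, 2): 20, (1, 2): 50},
]
delta, toy_pairs = average_similarity_matrix(toy_ratings, 3)

# Number of distinct pairs among 20 patterns.
n_pairs_20 = len(list(combinations(range(20), 2)))
```

Enumerating the pairs for 20 patterns in the same way gives the 190 pairs presented to each subject.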

The two-dimensional CMDS procedure indicated that the
important dimensions were: 1) presence/absence of a dominant
color, also referred to herein as "overall color," and 2)
color purity. It is interesting that both dimensions are purely
color based, indicating that, at the coarsest level of
judgment, people primarily use color to judge similarity. As
will be seen below, these dimensions remained in all solutions.
Moreover, the two-dimensional configuration strongly
resembles one of the perpendicular projections in the three-,
four- and five-dimensional solutions. The same holds for all
three dimensions from the three-dimensional solution,
indicating that these features could be the most general in human
perception. For both CMDS and WMDS, the same three
dimensions emerged from the three-dimensional configurations:
1) overall color, 2) color purity, and 3) regularity and
placement. The four-dimensional WMDS solution revealed
the following dimensions: 1) overall color, 2) color purity, 3)
regularity and placement, and 4) directionality. The
five-dimensional WMDS solution came with the same four
dominant characteristics with the addition of a dimension
that is referred to herein as "pattern heaviness." This fifth
dimension did not improve the goodness-of-fit significantly,
since it changed the WMDS error from 0.20 (for four
dimensions) to 0.18 (for five dimensions). Hence, as a result
of the above-described experiment, the following five
important similarity criteria were determined:

DIMENSION 1 — overall color, which can be described in 
terms of the presence/absence of a dominant color. At the 
negative end of this dimension are patterns with an overall 
impression of a single dominant color. This impression is 
created mostly because the percentage of one color is truly 
dominant. However, a multicolored image can also create an 
impression of dominant color. This happens when all the 
colors within the multicolored image are similar, having 
similar hues but different intensities or saturation. At the 
positive end of this dimension are patterns where no single 
color is perceived as dominant. 

DIMENSION 2 — directionality and orientation. This
dimension represents a dominant orientation in the edge
distribution, or a dominant direction in the repetition of the
structural element. Patterns with a single dominant
orientation, such as stripes and then checkers, have the lowest
values along this dimension. Mid-range values are assigned to
patterns with a noticeable but not dominant orientation, followed by
the patterns where a repetition of the structural element is
performed along two directions. Finally, completely nonoriented
patterns and patterns with uniform distribution of
edges or nondirectional placement of the structural element
are at the positive end of this dimension.
DIMENSION 3 — regularity and placement. This dimension
describes the regularity in the placement of the structural
element, its repetition and uniformity. At the negative
end of this dimension are regular, uniform and repetitive
patterns (with repetition completely determined by a certain
set of placement rules), whereas at the opposite end are
nonrepetitive or nonuniform patterns.

DIMENSION 4 — color purity. This dimension divides
patterns according to the degree of their colorfulness. At the
negative end are pale patterns, patterns with unsaturated
overtones, and patterns with dominant "sandy" or "earthy"
colors. At the positive end are patterns with very saturated
and very pure colors. Hence, this dimension is also referred
to as overall chroma or overall saturation within an image.

DIMENSION 5 — pattern complexity and heaviness. This
dimension showed only in the last, five-dimensional
configuration. Also, as will be shown below, it is not used in
judging similarity until the very last level of comparison. For
that reason it is also referred to herein as "general
impression." At one end of this dimension are patterns that are
perceived as "light" and "soft," while at the other end are
patterns described by subjects as "heavy," "busy" and
"sharp."

1.2 Grammar: Rules For Judging Similarity

A grammar, i.e., a set of rules governing use of the
above-described dimensions, was then determined. HCA
was used to order groups of patterns according to the degree
of similarity, as perceived by subjects, and to derive a list of
similarity rules and the sequence of their application. For
example, it was observed that the very first clusters were
composed of pairs of equal patterns. These were followed by
the clusters of patterns with similar color and dominant
orientation. The HCA analysis led to the following rules:

RULE 1 — equal pattern. Regardless of color, two textures
with exactly the same pattern are always judged to be the
most similar. Hence, this rule uses Dimension 2
(directionality) and Dimension 3 (pattern regularity and
placement).

RULE 2 — overall appearance. Rule 2 uses the combination
of Dimension 1 (dominant color) and Dimension 2
(directionality). Two patterns that have similar values in
both dimensions are also perceived as similar.

RULE 3 — similar pattern. Rule 3 concerns either Dimension
2 (directionality) or Dimension 3 (pattern regularity and
placement). Hence, two patterns which are dominant along
the same direction(s) are seen as similar, regardless of their
color. In the same manner, patterns with the same placement
or repetition of the structural element are seen as similar,
even if the structural element is not exactly the same.

RULE 4 — dominant color. Two multicolored patterns are
perceived as similar if they possess the same color
distributions regardless of their content, directionality, placement
or repetition of a structural element. This also holds for
patterns that have the same dominant or overall color.
Hence, this rule involves only Dimension 1 (dominant
color).

RULE 5 — general impression. Rule 5 concerns Dimensions
4 and 5, and divides patterns into "dim", "smooth",
"earthy", "romantic" or "pale" patterns (at one end of the
corresponding dimension) as opposed to "bold", "bright",
"strong", "pure", "sharp", "abstract" or "heavy" patterns (at
the opposite end). This rule represents the complex combination
of color, contrast, saturation and spatial frequency,
and therefore applies to patterns at the highest, abstract level
of understanding.
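
The dimension combinations named in Rules 1 to 4 can be sketched as simple logical predicates over per-pattern dimension values. The sketch below is illustrative only: the `Pattern` class, the numeric scale of the dimension values, the tolerance used by `similar`, and the idea of trying the rules in numerical order are all assumptions introduced here, not details fixed by the text.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    """Values of one pattern along Dimensions 1 to 4 (hypothetical scale)."""
    dim1: float  # overall color
    dim2: float  # directionality and orientation
    dim3: float  # regularity and placement
    dim4: float  # color purity


def similar(a, b, tol=0.1):
    """Hypothetical closeness test; the tolerance is an assumption."""
    return abs(a - b) <= tol


def rule1_equal_pattern(x, y, tol=1e-6):
    # Same directionality AND same regularity/placement; color is ignored.
    return similar(x.dim2, y.dim2, tol) and similar(x.dim3, y.dim3, tol)


def rule2_overall_appearance(x, y):
    # Similar dominant color AND similar directionality.
    return similar(x.dim1, y.dim1) and similar(x.dim2, y.dim2)


def rule3_similar_pattern(x, y):
    # Similar directionality OR similar regularity/placement.
    return similar(x.dim2, y.dim2) or similar(x.dim3, y.dim3)


def rule4_dominant_color(x, y):
    # Only Dimension 1 is involved.
    return similar(x.dim1, y.dim1)


def first_matching_rule(x, y):
    """Number of the first rule satisfied by the pair, or None."""
    rules = [rule1_equal_pattern, rule2_overall_appearance,
             rule3_similar_pattern, rule4_dominant_color]
    for number, rule in enumerate(rules, start=1):
        if rule(x, y):
            return number
    return None


x = Pattern(dim1=0.2, dim2=0.5, dim3=0.9, dim4=0.3)
same_pattern_other_color = Pattern(dim1=0.9, dim2=0.5, dim3=0.9, dim4=0.3)
close_color_and_direction = Pattern(dim1=0.25, dim2=0.55, dim3=0.1, dim4=0.8)
same_color_only = Pattern(dim1=0.2, dim2=0.9, dim3=0.1, dim4=0.3)
```

In this toy setup, a pair that differs only in color satisfies Rule 1, while a pair that shares nothing but its dominant color is matched only by Rule 4.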
The above set of rules represents an illustrative embodiment
of a basic grammar of pattern matching in accordance
with the invention. It should be noted that, in a given
application, each rule can be expressed as a logical
expression, e.g., a logical combination, using operators such
as OR, AND, XOR, NOT, etc., of the pattern values along
the dimensions involved in the rule. For example, consider
a cluster composed of Patterns X and Y that have similar
overall color and dominant orientation. The values associated
with Patterns X and Y along both Dimensions 1 and 2
are very close. Consequently, X and Y are perceived as
similar according to Rule 2, which may be expressed in
the following way:

(DIM_1(X) similar to DIM_1(Y)) and (DIM_2(X) similar to DIM_2(Y)).

Of course, numerous other logical expressions involving the
values of particular patterns along a given set of dimensions
may be generated in accordance with the invention.

2.0 Overview of the System

An illustrative embodiment of an exemplary pattern
retrieval and matching system in accordance with the invention
will now be described. The system utilizes the above-described
basic vocabulary V of color patterns consisting of
Dimensions 1 to 5: V={DIM_1, . . . , DIM_5}, and the grammar
G, i.e., the rules governing the use of the dimensions from
the vocabulary V: G={R_1, R_2, R_3, R_4, R_5}. The illustrative
embodiment of the system will, given an input image A and
a query Q: measure the dimensions DIM_i(A) from the
vocabulary, for i=1, . . . , 5, and for each image B from an
image database, apply rules R_1 through R_5 from G and
obtain corresponding distance measures dist_1(A, B), . . . ,
dist_5(A, B), where dist_i(A, B) is the distance between the
images A and B according to the rule i.

FIG. 1 shows a block diagram of a pattern retrieval and
matching system 10. The system 10 includes a feature
extraction component 12, which measures the dimensions
from vocabulary V, and a similarity measurement component
14, in which similar patterns are found using the rules
from the grammar G. The feature extraction component 12
is designed to extract Dimensions 1 to 4 of pattern similarity.
Dimension 5 (pattern complexity and heaviness) is not
implemented in this illustrative embodiment, since experiments
have shown that people generally use this criterion
only at a higher level of judgment, e.g., while comparing
groups of textures. The similarity measurement component
14 in this embodiment performs a judgment of similarity
according to Rules 1, 2, 3 and 4 from G. Rule 5 is not
supported in the illustrative embodiment, since it is only
used in combination with Dimension 5 at a higher level of
pattern matching, e.g., subdividing a group of patterns into
romantic, abstract, geometric, bold, etc.

It is important to note that the feature extraction component
12 is developed in accordance with a number of
assumptions derived from psychophysical properties of the
human visual system and conclusions extracted from the
above-noted experiment. For example, it is assumed that the
overall perception of color patterns is formed through the
interaction of luminance component L, chrominance component
C and achromatic pattern component AP. The luminance
and chrominance components approximate signal
representation in the early visual cortical areas while the
achromatic pattern component approximates signal representation
formed at higher processing levels, as described in
T. N. Cornsweet, "Visual perception," Academic Press,
Orlando, 1970. Therefore, the feature extraction component
12 simulates a similar mechanism, i.e., it decomposes an
image map into luminance and chrominance components in
the initial stages, and models pattern information later, as
described in detail below.

As in the human visual system, a first approximation is
that each of these components is processed through separate
pathways. While luminance and chrominance components
are used for the extraction of color-based information, the
achromatic pattern component is used for the extraction of
purely texture-based information. However, one can be
more precise by accounting for residual interactions along
the pathways, as described in R. L. DeValois and K. K.
DeValois, "Spatial Vision," New York: Oxford University
Press, 1990. The invention accomplishes this by extracting
the achromatic pattern component from the color
distribution, instead of using the luminance signal as in
previous models. Moreover, as will be described below, the
discrete color distribution is estimated through the use of a
specially-designed perceptual codebook allowing the interaction
between the luminance and chrominance components.

The feature extraction component 12 extracts features by
combining the following three major domains: a) a nonoriented
luminance domain represented by the luminance component
of an image, b) an oriented luminance domain
represented by the achromatic pattern map, and c) a nonoriented
color domain represented by the chrominance component.
The first two domains are essentially "color blind,"
whereas the third domain carries only the chromatic information.
Additional details regarding these domains can be
found in, e.g., M. S. Livingstone and D. H. Hubel, "Segregation
of form, color, movement and depth: Anatomy, physiology
and perception," Science, Vol. 240, pp. 740-749,
1988. The domains have been experimentally verified in
perceptual computational models for segregation of color
textures, as described in T. V. Papathomas et al., "A human
vision based computational model for chromatic texture
segregation," IEEE Transactions on Systems, Man and
Cybernetics, Part B: Cybernetics, Vol. 27, No. 3, June
1997. In accordance with the invention, purely color-based
dimensions (1 and 4) are extracted in the nonoriented
domains and are measured using the color feature vector.
Texture-based dimensions (2 and 3) are extracted in the
oriented luminance domain, through the scale-orientation
processing of the achromatic pattern map.

The feature extraction component 12 as shown in FIG. 1
includes processing blocks 20, 22, 24, 26 and 28. Image
decomposition block 20 transforms an input image into the
Lab color space and decomposes it into luminance L and
chrominance C=(a,b) components. Estimation of color distribution
block 22 uses both L and C maps for color
distribution estimation and extraction of color features, i.e.,
performs feature extraction along the color-based Dimensions
1 and 4. Pattern map generation block 24 uses color
features extracted in block 22 to build the achromatic pattern
map. Texture primitive extraction and estimation blocks 26
and 28 use the achromatic pattern map to estimate the spatial
distribution of texture primitives, i.e., to perform feature
extraction along texture-based Dimensions 2 and 3.

The similarity measurement component 14 finds similar
patterns using the rules from the grammar G. The similarity
measurement component 14 accesses an image database 30,
and includes a similarity judging block 32. Given an input
image A, which may be submitted or selected as part of a
user query Q, for a designated set of the images in the
database 30, rules R_1 through R_4 are applied and
corresponding distance measures are computed. Then, depending
on the query Q, a set of best matches is found. 
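
How the best matches are selected for each type of query is not spelled out at this point. As one hedged illustration, assuming a distance function for a single rule is already available, the selection can be sketched as keeping the k database entries with the smallest distance to the query. The names `best_matches` and `absolute_difference`, the dictionary layout of the database, and the toy feature vectors are all hypothetical.

```python
import heapq


def best_matches(query_features, database, distance, k=3):
    """Return identifiers of the k database entries closest to the query.

    `database` maps an image identifier to its stored feature vector, and
    `distance` stands in for one of the dist_i(A, B) measures. Both the
    layout and the parameter names are assumptions made for this sketch.
    """
    scored = ((distance(query_features, feats), image_id)
              for image_id, feats in database.items())
    return [image_id for _, image_id in heapq.nsmallest(k, scored)]


def absolute_difference(a, b):
    """Toy stand-in distance: sum of absolute feature differences."""
    return sum(abs(p - q) for p, q in zip(a, b))


toy_database = {
    "img_a": [0.1, 0.8],
    "img_b": [0.5, 0.5],
    "img_c": [0.2, 0.6],
}
ranked = best_matches([0.1, 0.8], toy_database, absolute_difference, k=2)
```

A real query would presumably combine the distances from several rules; the single-distance ranking here is only the simplest case.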

3.0 Feature Extraction Based on Color Information 

The feature extraction based on color information will
now be described in greater detail with reference to FIG. 2. 
FIG. 2 shows the processing of color information, as distinguished
from texture information, in the system 10 of
FIG. 1. Since color representation is used in the FIG. 1 
system both for the extraction of color-related dimensions 
(color features), and for the construction of the achromatic 
pattern map (used later in texture processing), the feature 
extraction component 12 generates a compact, perceptually-based
color representation. As shown in FIG. 2, this representation
is generated and processed using processing blocks
40, 42, 44 and 46. In block 40, the input image is transformed
into the Lab color space. This block corresponds to
the image decomposition block 20 of FIG. 1. In block 42, 
which may be viewed as an element of block 22 of FIG. 1, 
a color distribution is determined using a vector 
quantization-based histogram technique which involves 
reading a color codebook. Block 44, which also may be
viewed as an element of block 22, extracts significant color 
features from the histogram generated in block 42. Block 46, 
which may be viewed as an element of the similarity judging 
block 32, then performs a color distance calculation to 
determine the perceptual similarity between the determined 
color distribution and the corresponding distribution of an 
image from the database 30. 

3.1 Image Conversion 

The conversion of the input image from RGB to Lab color 
space in block 40 of FIG. 2 will now be described in greater 
detail. An important decision to be made in deriving a color 
feature representation is which color space to use. In order 
to produce a system that performs in accordance with human 
perception, a representation based on human color matching 
may be used. CIE Lab is such a color space, and is described 
in G. Wyszecki and W. S. Stiles, "Color science: Concepts 
and methods, quantitative data and formulae," John Wiley 
and Sons, New York, 1982. The Lab color space was 
designed so that inter-color distances computed using the 
L2-norm correspond to subjective color matching data. This
representation is obtained from an RGB representation (or 
any other linear color representation such as YIQ, YUV, etc.) 
by first linearizing the input data, i.e., removing gamma 
correction. Next, the data is transformed into the XYZ color 
space using a linear operator. In the XYZ space, the data is 
normalized with respect to the illumination white point, and 
then converted to the Lab representation via a nonlinear 
transform. Additional details on this conversion process and 
the design of the Lab color space may be found in the 
above-cited G. Wyszecki and W. S. Stiles reference. 
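
By way of non-limiting illustration, the conversion steps described above (removal of gamma correction, a linear transform to XYZ, white point normalization, and a nonlinear transform to Lab) may be sketched in Python as follows. The sketch assumes that the input is sRGB-encoded with components in [0, 1], so the sRGB transfer curve and the sRGB-to-XYZ matrix stand in for the linearization and the linear operator; the text does not name a particular RGB encoding. The function and constant names are hypothetical, and the D65 white point is used for the normalization.

```python
# Illustrative sketch only. Assumption: sRGB-encoded input in [0, 1]
# (the text does not specify the RGB encoding) and a D65 white point.

# Linear operator from linear sRGB to XYZ (standard sRGB/D65 values).
RGB_TO_XYZ = (
    (0.4124564, 0.3575761, 0.1804375),
    (0.2126729, 0.7151522, 0.0721750),
    (0.0193339, 0.1191920, 0.9503041),
)
D65_WHITE = (0.95047, 1.0, 1.08883)


def remove_gamma(c):
    """Linearize one sRGB component, i.e., remove the gamma correction."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def lab_nonlinearity(t):
    """CIE Lab nonlinear transform: cube root with a linear segment near 0."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def rgb_to_lab(r, g, b):
    """Convert one sRGB pixel (components in [0, 1]) to an (L, a, b) triple."""
    linear = [remove_gamma(v) for v in (r, g, b)]
    # Linear transform into the XYZ color space.
    xyz = [sum(m * v for m, v in zip(row, linear)) for row in RGB_TO_XYZ]
    # Normalize by the white point, then apply the nonlinear transform.
    fx, fy, fz = (lab_nonlinearity(v / w) for v, w in zip(xyz, D65_WHITE))
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)
```

Under these assumptions, a pure white input maps to approximately L = 100 with a and b near zero, and a black input maps to the origin of the Lab space.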

One potential difficulty with this approach is that for most 
images, the white point is unknown. This problem is avoided 
in the illustrative embodiment by using exclusively the D65 
white point, which corresponds to "outdoor daylight" illumination.
As long as all of the images are taken under the same
lighting conditions, this is not a problem. However, its use 
for images taken under other lighting conditions can cause 
some shift in the estimated color distribution. In general, 
these shifts are relatively small and the dominant color 
representation, to be described below, appears to be able to 
accommodate the inaccuracies introduced by the fixed white 
point assumption. It should be noted that images taken under
strongly colored lighting will generally not be represented
correctly.

After determining a perceptually meaningful color representation
for the L2 distance metric, the next step is to estimate the
color distribution in the input image by computing a histogram
of the input color data. This requires specifying a set of bin
centers and decision boundaries. Since linear color spaces
(such as RGB) can be approximated by 3D cubes, bin centers can
be computed by performing separable, equidistant
discretizations along each of the coordinate axes.
Unfortunately, by going to the nonlinear Lab color space, the
volume of all possible colors distorts from a cube to an
irregular cone. Consequently, there is no simple discretization
that can be applied to this volume.

3.2 Histogram Design 

To estimate color distributions in the Lab space, for the
volume which represents valid colors, the set of bin centers
and decision boundaries which minimize some error criterion
are determined. In the Lab color system, the L2-norm
corresponds to perceptual similarity, thus representing the
optimal distance metric for that space. Therefore, to obtain
an optimal set of bin centers and decision boundaries, one
attempts to find Lab coordinates of N bin centers so that the
overall mean-square classification error is minimized. Since
this is the underlying problem in vector quantization (VQ),
the LBG vector quantization algorithm, described in A.
Gersho and R. M. Gray, "Vector quantization and signal
compression," Kluwer Academic Publishers, Boston, 1992,
may be used to obtain a set of codebooks which optimally
represent the valid colors in the Lab space.

In any VQ design, the training data can have a large effect 
on the final result. A commonly used VQ design approach 
selects training images which are: a) either representative of 
a given problem so the codebook is optimally designed for 
that particular application, or b) span enough of the input 
space, so the resulting codebook can be used in many 
different applications. The following problem occurs with 
both of these approaches: in order to obtain an accurate 
estimation for the distribution of all possible colors, a large
number of training images is required. This results in a
computationally expensive and possibly intractable design 
problem. To overcome this problem, the present invention 
takes a different approach. Since we need to deal with an 
arbitrary input, we will assume that every valid color is 
equi-probable. Hence, a synthetic set of training data can be 
generated by uniformly quantizing the XYZ space. This data 
was transformed into the Lab space and then used as input 
to a standard VQ design algorithm. This resulted in a set of 
codebooks ranging in size from 16 to 512 colors. 
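
As a non-limiting illustration of the codebook design described above, the following Python sketch runs a generalized Lloyd iteration, which is the core update of LBG-type vector quantizer design, on a list of training points. It is a simplification under stated assumptions: the training list is assumed to already contain Lab triples (e.g., obtained from a uniform grid of valid XYZ values), the initial bin centers are drawn at random rather than by the codebook-splitting initialization commonly associated with LBG, and all function names are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared L2 distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def nearest(point, centers):
    """Index of the bin center closest to the point in the L2 sense."""
    return min(range(len(centers)), key=lambda k: sq_dist(point, centers[k]))


def lloyd_codebook(training, n_bins, iterations=20, seed=0):
    """Design n_bins bin centers for a list of training points (tuples).

    Assumption: random initialization is used here in place of the
    splitting procedure of a full LBG design.
    """
    rng = random.Random(seed)
    centers = rng.sample(training, n_bins)
    for _ in range(iterations):
        # Decision step: assign every training point to its nearest center.
        cells = [[] for _ in centers]
        for p in training:
            cells[nearest(p, centers)].append(p)
        # Update step: move each center to the centroid of its cell;
        # an empty cell keeps its previous center.
        centers = [
            tuple(sum(c) / len(cell) for c in zip(*cell)) if cell else centers[k]
            for k, cell in enumerate(cells)
        ]
    return centers


def mean_square_error(training, centers):
    """Mean-square classification error of the codebook on the training set."""
    errors = (sq_dist(p, centers[nearest(p, centers)]) for p in training)
    return sum(errors) / len(training)
```

Running the same routine separately on the (a,b) components of each luminance slice would give one chrominance codebook per slice, in the spirit of the structured design discussed in the text.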

A potential drawback of these codebooks is that they are 
designed as a global representation of the entire color space 
and consequently, there is no structure to the bin centers. In 
an embodiment of the invention which allows a user to 
interact with the retrieval process, it is desirable for the color 
representation to provide manipulation with colors in a 
"human-friendly" manner. To simulate human performance 
in color perception, a certain amount of structure on the 
relationships between the L, a, and b components must be 
introduced. One possible way to accomplish this is by 
separating the luminance L from the chrominance (a,b)
components. In the illustrative embodiment, a one-dimensional
quantization is first applied on luminance values of the
training data, e.g., using a Lloyd-Max quantizer.
Then, after partitioning the training data into slices of similar 
luminance, a separate chrominance codebook is designed for 
each slice by applying the LBG algorithm to the appropriate 
(a,b) components. 

This color representation better mimics human perception
and allows the formulation of functional queries such as
looking for "same but lighter color," "paler," "contrasting,"
etc. For example, the formulation of a query vector to search
for a "lighter" color can be accomplished through the
following steps: 1) extract the luminance $L_Q$ and the
$(a_Q, b_Q)$ pair for the query color, 2) find the codebook for
a higher luminance level $L > L_Q$, 3) in this codebook, find
the cell which corresponds to the (a,b) entry which is the
closest to $(a_Q, b_Q)$ in the L2 sense, and 4) retrieve all
images having (L,a,b) as a dominant color. Moreover, starting
from the relationship between L, a, and b values for a
particular color, and its hue H and saturation S,

$$H = \arctan\frac{b}{a}, \qquad S = \sqrt{a^2 + b^2},$$

similar procedures can be applied to satisfy user queries
such as "paler color," "bolder color," "contrasting color," 
etc. Finally, in applications in which the search is performed 
between different databases or when the query image is 
supplied by the user, separation of luminance and chromi- 
nance allows for elimination of the unequal luminance 
condition. Since the chrominance components contain the 
information about the type of color regardless of the inten- 
sity value, color features can be extracted only in the 
chrominance domain C(i,j)={a(i,j),b(i,j)}, for the corre- 
sponding luminance level, thus allowing for comparison 
between images of different quality. 
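
The "lighter color" query and the hue and saturation relationship discussed above may be illustrated, without limitation, by the following Python sketch. The layout of the per-luminance chrominance codebooks (a dictionary keyed by luminance level) and the choice of the next higher luminance level are assumptions made for the example, and the function names are hypothetical. The two-argument arctangent is used in place of a plain arctan(b/a) so that a = 0 does not cause a division by zero.

```python
import math


def hue_saturation(a, b):
    """Hue angle (radians) and saturation of a chrominance pair (a, b).

    atan2 is used instead of arctan(b / a) so that a == 0 is handled.
    """
    return math.atan2(b, a), math.hypot(a, b)


def lighter_color(query, codebooks):
    """Return the codebook color that is 'the same but lighter' than query.

    query:     (L, a, b) triple of the query color.
    codebooks: hypothetical layout, dict mapping a luminance level to the
               list of (a, b) entries of that level's chrominance codebook.
    """
    l_q, a_q, b_q = query
    # Step 2: find a codebook for a higher luminance level.
    higher = [level for level in codebooks if level > l_q]
    if not higher:
        return None  # the query color is already at the lightest level
    l_new = min(higher)  # assumption: use the next higher level
    # Step 3: closest (a, b) entry in the L2 sense.
    a_new, b_new = min(
        codebooks[l_new],
        key=lambda ab: (ab[0] - a_q) ** 2 + (ab[1] - b_q) ** 2,
    )
    return l_new, a_new, b_new
```

The returned (L, a, b) triple would then be used in step 4 to retrieve all images having that color as a dominant color.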

3.3 Color Feature Extraction

Color histogram representations based on color code- 
books have been widely used as a feature vector in image 
segmentation and retrieval, as described in, e.g., M. Ioka, "A
method of defining the similarity of images on the basis of 
color information," Technical Report RT-0030, IBM 
Research, Tokyo Research Laboratory, November 1989, and 
M. Swain and D. Ballard, "Color indexing," International
Journal of Computer Vision, Vol. 7, No. 1, 1991. Although 
good results have been reported, a feature set based solely on 
the image histogram may not provide a reliable representa- 
tion for pattern matching and retrieval. This is due to the fact 
that most patterns are perceived as combinations of a few 
dominant colors. For that reason, the illustrative embodi- 
ment of the invention utilizes color features and associated 
distance measures comprising the subset of colors which 
best represent an image, augmented by the area percentage 
in which each of these colors occurs.

One implementation of the system 10 of FIG. 1 uses a
codebook with N = 71 colors denoted by
$C_{71} = \{C_1, C_2, \ldots, C_{71}\}$, where each color
$C_i = \{L_i, a_i, b_i\}$ is a three-dimensional Lab vector. As
the first step in the feature extraction procedure (before
histogram calculation), the input image is convolved with a
B-spline smoothing kernel. This is done to refine contours of
texture primitives and foreground regions, while eliminating
most of the background noise. The B-spline kernel is used since
it provides an optimal representation of a signal in the L2
sense, hence minimizing the perceptual error, as described in
M. Unser et al., "Enlargement or reduction of digital images
with minimum loss of information," IEEE Trans. Image
Processing, Vol. 4, pp. 247-257, March 1995. The second step
(after the histogram of an image is generated) involves
extraction of dominant colors to find colors from the codebook
that adequately describe a given texture pattern. This was
implemented by sequentially increasing the number of colors
until all colors covering more than 3% of the image area have
been extracted. The remaining pixels were represented with
their closest matches (in an L2 sense) from the extracted
dominant colors. Finally, the percentage of each dominant color
was calculated and the color feature vectors were obtained as

$$f_c = \{(i_j, p_j) \mid j \in [1, N]\},$$

where $i_j$ is the index in the codebook, $p_j$ is the
corresponding percentage and N is the number of dominant
colors in the image. Another similar representation has been
successfully used in image retrieval, as described in W. Y. Ma
et al., "Tools for texture/color based search of images,"
Proc. of SPIE, Vol. 3016, 1997.
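
The dominant color extraction described above may be illustrated, without limitation, by the following Python sketch. It assumes that each pixel has already been smoothed and quantized to a codebook index, and it selects all colors above the 3% area threshold in a single pass, which yields the same set as adding colors sequentially. The fallback to the single most frequent color when no color exceeds the threshold is an assumption of the sketch, as are the function names; the input is assumed to be non-empty.

```python
def sq_dist(p, q):
    """Squared L2 distance between two Lab triples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def color_feature_vector(pixel_indices, codebook, min_area=0.03):
    """Return the color feature vector as a list of (index, percentage) pairs.

    pixel_indices: codebook index of every pixel in the (non-empty) image.
    codebook:      list of (L, a, b) triples.
    min_area:      area fraction a color must exceed to count as dominant.
    """
    total = len(pixel_indices)
    counts = {}
    for idx in pixel_indices:
        counts[idx] = counts.get(idx, 0) + 1

    # Dominant colors: every color covering more than min_area of the image.
    dominant = [i for i, n in counts.items() if n / total > min_area]
    if not dominant:
        # Assumption: fall back to the single most frequent color.
        dominant = [max(counts, key=counts.get)]

    # Remaining pixels are represented by their closest dominant color.
    merged = {i: counts[i] for i in dominant}
    for idx, n in counts.items():
        if idx not in merged:
            closest = min(
                dominant, key=lambda d: sq_dist(codebook[idx], codebook[d])
            )
            merged[closest] += n

    # Pairs (i_j, p_j), ordered by decreasing area percentage.
    return sorted(
        ((i, n / total) for i, n in merged.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

The length of the returned list is the number of dominant colors N, which the system may store alongside the pairs themselves.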

The above-described feature extraction of the present 
invention has several advantages. For example, it provides 
an optimal representation of the original color content by 
minimizing the MSE introduced when using a small number 
of colors. Then, by exploiting the fact that the human eye 
cannot perceive a large number of colors at the same time, 
nor is it able to distinguish close colors well, a very compact 
feature representation is used. This greatly reduces the size 
of the features needed for storage and indexing. 
Furthermore, because of the codebook used, this represen- 
tation facilitates queries containing an overall impression of 
patterns expressed in a natural way, such as "find me all 
blue-yellow fabrics," "find me the same color, but a bit
lighter," etc. Finally, in addition to storing the values of the 
dominant colors and their percentages, the system also 
stores the actual number of dominant colors. This informa- 
tion is useful in addressing the more complex dimensions of 
pattern similarities, e.g., searching for simple and single 
colored patterns, versus heavy, multicolored ones. 

3.4 Color Metric 

The color features described above, represented as color 
and area pairs, allow the definition of a color metric that 
closely matches human perception. The idea is that the 
similarity between two images in terms of color composition 
should be measured by a combination of color and area 
differences. Given two images, a query image A and a target
image B, with $N_A$ and $N_B$ dominant colors, and feature
vectors $f_c(A) = \{(i_a, p_a) \mid a \in [1, N_A]\}$ and
$f_c(B) = \{(i_b, p_b) \mid b \in [1, N_B]\}$, respectively, the
similarity between these two images is first defined in terms
of a single dominant color. Suppose that i is the dominant
color in image A. Then, the similarity between A and B is
measured in terms of that color using the minimum of distance
measures between the color element (i, p) and the set of color
elements $\{(i_b, p_b) \mid b \in [1, N_B]\}$:

$$d(i, B) = \min_{b \in [1, N_B]} D\big((i, p), (i_b, p_b)\big),$$

where

$$D\big((i, p), (i_b, p_b)\big) = |p - p_b| + \sqrt{(L - L_b)^2 + (a - a_b)^2 + (b - b_b)^2}.$$

Once the distance d(i,B) has been calculated, besides its
value we also use its argument to store the color value from
B that, for a particular color i from A, minimizes d(i,B). We
denote this color value by k(i, B) as:

$$k(i, B) = \arg\min_{b \in [1, N_B]} D\big((i, p), (i_b, p_b)\big).$$
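
The single-color distance d(i,B) and its minimizing argument k(i,B) may be sketched in Python as follows, by way of non-limiting example. The sketch assumes that the Lab coordinates have been scaled so that the color distance and the area percentages both lie within [0, 1]; the scaling itself, the data layout and the function names are assumptions made for illustration.

```python
import math


def pair_distance(color_a, p_a, color_b, p_b):
    """Distance between two color/area pairs: area term plus Lab term.

    Assumption: Lab triples are pre-scaled so that the Lab distance lies
    in [0, 1], like the area percentages.
    """
    return abs(p_a - p_b) + math.dist(color_a, color_b)


def single_color_distance(color, p, features_b):
    """Return (d(i, B), k(i, B)) for one dominant color of image A.

    color, p:   Lab triple and area percentage of the color from A.
    features_b: list of (lab_triple, percentage) pairs describing image B.
    The second return value is the position in features_b of the
    minimizing color.
    """
    k = min(
        range(len(features_b)),
        key=lambda b: pair_distance(color, p, *features_b[b]),
    )
    return pair_distance(color, p, *features_b[k]), k
```

Because the two terms are added rather than multiplied, a pair with an identical color but a very different area percentage still yields a nonzero distance.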

Note that the distance between two color/area pairs is
defined as the sum of the distance in terms of the area
percentage and the distance in the Lab color space, both
within the range [0,1]. The above-cited W. Y. Ma et al.
reference used a different definition where the overall
distance is the product of these two components. That
definition has the drawback that when either component
distance is very small the remaining component becomes
irrelevant. Consider the extreme case, when the color distance
between two color/area pairs is 0. This is not unusual since
the color space has been heavily quantized. Then, even if the
difference between the two area percentages is very large, the
overall distance is 0, yielding a measure that does not match
human perception. The illustrative embodiment of the invention
provides a simple and effective remedy to that problem, namely,
it guarantees that both color and area components contribute
to the perception of color similarity.

Given the distance between two images in terms of one dominant
color as defined above, the distance in terms of overall color
composition is defined as the sum over all dominant colors from
both images, in the following way: 1) for image A,
$\forall a \in [1, N_A]$ find $k_A(i_a, B)$ and the
corresponding distance $d(i_a, B)$, 2) repeat this procedure
for all dominant colors in B, that is, $\forall b \in [1, N_B]$
find $k_B(i_b, A)$ and $d(i_b, A)$, and 3) calculate the overall
distance as

$$dist(A, B) = \sum_{a \in [1, N_A]} d(i_a, B) + \sum_{b \in [1, N_B]} d(i_b, A).$$

Other types of distance calculations could also be used to
generate a color metric in accordance with the invention.

4.0 Feature Extraction Based on Texture Information

The feature extraction based on texture information will
now be described in greater detail with reference to FIG. 3.
FIG. 3 shows the processing of texture information, as
distinguished from color information, in the system 10 of
FIG. 1. As shown in FIG. 3, this representation is generated
and processed using processing blocks 50, 51, 52, 54, 56 and
58. In block 50, the achromatic pattern map is generated
from the color feature vector, after spatial smoothing to
refine texture primitives and remove background noise. This
block corresponds to the pattern map generation block 24 of
FIG. 1. In block 51, which may be viewed as an element of
block 26 of FIG. 1, the edge map is built from the achromatic
pattern map. Block 52 applies a nonlinear mechanism to
suppress nontextured edges. Block 54 performs orientation
processing to extract the distribution of pattern contours
along different spatial directions. Blocks 52 and 54 may be
viewed as elements of block 26 of FIG. 1. Block 56, which
corresponds to block 28 of FIG. 1, computes a scale-spatial
estimation of texture edge distribution. Block 58, which may
be viewed as an element of the similarity judging block 32,
then performs a texture distance calculation to determine the
perceptual similarity between the determined texture edge
distribution and the corresponding distribution of an image
from the database 30.

The achromatic map in block 50 is obtained in the
following manner: For a given texture, by using the number
of its dominant colors N, a gray level range of 0 to 255 is
discretized into N levels. Then, dominant colors are mapped
into gray levels according to the following rule: Level 0 is
assigned to the dominant color with the highest percentage
of pixels, the next level is assigned to the second dominant
color, etc., until the level 255 has been assigned to a
dominant color with the lowest area percentage. In other
words, the achromatic pattern map models the fact that
human perception and understanding of form, shape and
orientation is completely unrelated to color. Furthermore, it
resolves the problem of secondary interactions between the
luminance and chrominance pathways. As an example,
consider a pair of textures in which the values in the
luminance map are much higher for one of the textures,
hence the edge amplitudes and edge distributions are
different for the two corresponding images. Moreover, the
dominant colors are not close, which makes the classification
of these two patterns as similar (either using luminance,
chrominance or color features) extremely difficult. However,
in the above-described model, the way that luminance and
chrominance are coupled into a single pattern map guarantees
that both textures will have identical achromatic pattern
maps, leading to almost identical texture feature vectors.

The objective of edge and orientation processing in blocks
51, 52 and 54 is to extract information about the pattern
contours from the achromatic pattern map. Instead of applying
a bank of oriented filters, as in previous models, the
illustrative embodiment of the present invention computes
polar edge maps and uses them to extract the distribution of
edges along different directions. This approach makes it
possible to obtain the edge distribution for an arbitrary
orientation with low computational cost. It also introduces a
certain flexibility in the extraction of texture features since,
if necessary, the orientation selectivity can be enhanced by
choosing an arbitrary number of orientations. In the
illustrative system 10, edge-amplitude and edge-angle maps,
calculated at each point, are used. Edge maps were
obtained by convolving an input achromatic pattern map
with the horizontal and vertical derivatives of a Gaussian
and converting the result into polar coordinates. The
derivatives of a Gaussian along x and y axes were computed as

while the derivatives of the achromatic pattern map along x
and y axes were computed as

$$A_x(i,j) = (g_x * AP)(i,j), \qquad A_y(i,j) = (g_y * AP)(i,j),$$

where * stands for two-dimensional convolution. These
derivatives were then transformed into their polar
representation as:

$$A(i,j) = \sqrt{A_x(i,j)^2 + A_y(i,j)^2}, \qquad \theta(i,j) = \tan^{-1}\frac{A_y(i,j)}{A_x(i,j)}.$$

Texture phenomenon is created through the perception of
image "edgeness" along different directions, over different
scales. Hence, to estimate the placement and organization of
texture primitives, information about the edge strength at a
certain point is not needed; rather, it is only necessary to
know a) whether an edge exists at this point, and b) the
direction of the edge. Therefore, after the transformation
into the polar representation, the amplitude map is
nonlinearly processed as:

$$A^Q(i,j) = \begin{cases} 1, & \mathrm{med}(A(i,j)) \geq T \\ 0, & \mathrm{med}(A(i,j)) < T, \end{cases}$$

where med(.) represents the median value calculated over a
5x5 neighborhood. Nonlinear median operation was introduced
to suppress false edges in the presence of stronger
ones, and eliminate weak edges introduced by noise. The
quantization threshold T is determined as:

$$T = \mu_A - 2\sigma_A,$$

where $\mu_A$ and $\sigma_A^2$ are the mean and variance of the
edge amplitude, estimated on a set of 300 images. This
selection allowed all the major edges to be preserved. After
quantizing the amplitude map, the discretization of the angle
space is performed, dividing it into the six bins corresponding
to directions 0°, 30°, 60°, 90°, 120° and 150°, respectively.
For each direction an amplitude map $A_{\theta_i}(i,j)$ is
built as:

$$A_{\theta_i}(i,j) = \begin{cases} 1, & A^Q(i,j) = 1 \wedge \theta(i,j) \in \theta_i \\ 0, & \text{otherwise}, \end{cases}$$

where "∧" denotes a logic "and" operator and "∨" denotes a
logic "or" operator. The $\theta_i$ in this example correspond
to the six directions identified above.

To address the textural behavior at different scales, the mean
and variance of the edge density distribution are estimated by
applying overlapping windows of different sizes to the set of
directional amplitude maps. For a given scale, along a given
direction, edge density is calculated simply by summing the
values of the corresponding amplitude map within the
window, and dividing that value by the total number of
pixels in the window. Four scales were used in the
illustrative embodiment, with the following parameters for the
sliding window:

Scale 1: $WS_1 = 0.75W \times 0.75H$, $N_1 = 30$,

Scale 2: $WS_2 = 0.40W \times 0.40H$, $N_2 = 56$,

Scale 3: $WS_3 = 0.20W \times 0.20H$, $N_3 = 80$,

Scale 4: $WS_4 = 0.10W \times 0.10H$, $N_4 = 224$,

where $WS_i$ and $N_i$ are window size and number of windows
for scale i, and W and H are the width and height of the input
texture. Note that the above approach is scale (zoom)
invariant. In other words, the same pattern at different scales
will have similar feature vectors.

The output of the above-described texture processing
block 56 is a texture feature vector of length 48:

$$f_t = [\mu_1^{\theta_1}\ \sigma_1^{\theta_1}\ \mu_1^{\theta_2}\ \sigma_1^{\theta_2} \ldots \mu_1^{\theta_6}\ \sigma_1^{\theta_6} \ldots \mu_4^{\theta_6}\ \sigma_4^{\theta_6}],$$

where $\mu_i^{\theta_j}$ and $\sigma_i^{\theta_j}$ stand for mean
and standard deviation of texture edges at scale i along the
direction $\theta_j$. Each feature component may be normalized
so that it assumes the mean value of 0 and standard deviation
of 1 over the whole database. In that way this feature vector
essentially models both texture-related dimensions
(directionality and regularity): the distribution estimates
along the different directions address the dimension of
directionality. At any particular scale, the mean value can be
understood as an estimation of the overall pattern quality,
whereas the standard deviation estimates the uniformity,
regularity and repetitiveness at this scale, thus addressing
the dimension of pattern regularity.

4.1 Texture Metric

As previously mentioned, at any particular scale, the
mean values measure the overall edge pattern and the
standard deviations measure the uniformity, regularity and
repetitiveness at this scale. The above-noted experiments
demonstrated that the perceptual texture similarity between
two images is a combination of these two factors in the
following way: if two textures have very different degrees of
uniformity they are immediately perceived as different. On
the other hand, if their degrees of uniformity, regularity and
repetitiveness are close, their overall patterns should be
further examined to judge similarity. The smooth transition
between these two factors can be implemented using an
exponential function. Thus, the distance between the query
image A and the target image B, with texture feature vectors

$$f_t(A) = [\mu_1^{\theta_1}(A)\ \sigma_1^{\theta_1}(A) \ldots \mu_4^{\theta_6}(A)\ \sigma_4^{\theta_6}(A)] \quad \text{and} \quad f_t(B) = [\mu_1^{\theta_1}(B)\ \sigma_1^{\theta_1}(B) \ldots \mu_4^{\theta_6}(B)\ \sigma_4^{\theta_6}(B)],$$

respectively, is defined as:

$$dist(A, B) = \sum_{i=1}^{4} \sum_{j=1}^{6} d_i^{\theta_j},$$

$$d_i^{\theta_j} = w_M(i, \theta_j)\, M_i^{\theta_j} + w_D(i, \theta_j)\, D_i^{\theta_j}.$$

At each scale i and direction $\theta_j$, the distance function
$d_i^{\theta_j}$ is the weighted sum of two terms: the first,
$M_i^{\theta_j}$, measuring the difference in mean edge density
and the second, $D_i^{\theta_j}$, measuring the difference in
standard deviation, or regularity. The weighting factors,
$w_M(i, \theta_j)$ and $w_D(i, \theta_j)$, are designed such
that when the difference in standard deviation is small, the
first term is more dominant; as it increases, the second term
becomes dominant, thus matching human perception as
stated above.
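
By way of non-limiting illustration, the weighted texture distance may be sketched in Python as follows. The exact form of the weighting factors is not reproduced in the text above, so the sketch assumes a logistic (exponential-based) transition in the standard deviation difference, with a sharpness parameter `alpha` and a transition point `d0`; this particular form, the absolute differences used for the two terms, the summation over all scale and direction components, the default parameter values and the function name are all assumptions made for the example.

```python
import math


def texture_distance(feat_a, feat_b, alpha=10.0, d0=0.95):
    """Texture distance between two images (illustrative sketch).

    feat_a, feat_b: lists of (mean, std) pairs, one pair per
                    (scale, direction) combination, in the same order.
    alpha, d0:      sharpness and transition point of the assumed
                    logistic weighting.
    """
    total = 0.0
    for (mu_a, sd_a), (mu_b, sd_b) in zip(feat_a, feat_b):
        m = abs(mu_a - mu_b)   # difference in mean edge density
        d = abs(sd_a - sd_b)   # difference in standard deviation
        # Assumed logistic weights: w_d is near 0 for small d (the mean
        # term dominates) and near 1 for large d (the std term dominates).
        w_d = 1.0 / (1.0 + math.exp(-alpha * (d - d0)))
        w_m = 1.0 - w_d
        total += w_m * m + w_d * d
    return total
```

With this weighting, two textures with nearly equal standard deviations are compared almost entirely on their mean edge densities, whereas a large difference in standard deviation dominates the result regardless of the means.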

The parameters $\alpha$ and $D_0$ control the behavior of the
weighting factors, where $\alpha$ controls the sharpness of the
transition, and $D_0$ defines the transition point. These two
parameters were trained in the illustrative embodiment using
40 images taken from an interior design database, in the
following way. First, 10 images were selected as
representatives of the database. Then, for each
representative, 3 comparison images were chosen as the most
similar, close and least similar to the representative. For
each representative image $I_i$, i=1, . . . , 10, the
comparison images $C_{ij}$, j=1, . . . , 3 are ordered in
decreasing similarity. Thus, sets $\{I_i\}$ and $\{C_{ij}\}$
represent the ground truth. For any given set of parameters
$(\alpha, D_0)$, the rankings of the comparison images as given
by the distance function can be computed. Let
$rank_{ij}(\alpha, D_0)$ represent the ranking of the comparison
image $C_{ij}$ for representative image $I_i$. Ideally, one
would like to achieve

$$rank_{ij}(\alpha, D_0) = j, \quad \forall i, j \mid i \in [1, 10],\ j \in [1, 3].$$

The deviation from ground truth is computed as

$$D(\alpha, D_0) = \sum_{i=1}^{10} d_i(\alpha, D_0),$$

where

$$d_i(\alpha, D_0) = \sum_{j=1}^{3} \left| dist(I_i, C_{ij}) - dist(I_i, C_{i,\,rank_{ij}(\alpha, D_0)}) \right|.$$

The goal of the above-described parameter training is to
minimize the function $D(\alpha, D_0)$. Many standard
optimization algorithms can be used to achieve this. For
example, Powell's algorithm, as described in William H. Press
et al., "Numerical Recipes in C," 2nd edition, pp. 412-420,
Cambridge University Press, New York, 1992, was used in
the illustrative embodiment, and the optimal parameters
derived were $\alpha = 10$ and $D_0 = 0.95$.

5.0 Similarity Measurement 
As previously noted, the similarity measurement component 14
in system 10 of FIG. 1 performs similarity measurements based
on the rules from the above-described grammar G. The system
was tested on a number of exemplary databases, including a
wide variety of different pattern images including
photographs, interior design, architectural surfaces, historic
ornaments and oriental carpets. The application of the four
rules, Rules 1 to 4, of the grammar G, is described in greater
detail below.

APPLYING RULE 1 (equal pattern): Regardless of color, 
two textures with exactly the same pattern are always judged 
to be similar. Hence, this rule concerns the similarity only in 
the domain of texture features, without actual involvement 
of any color-based information. Therefore, this rule is imple- 
mented by comparing texture features only, using the above- 
described texture metric. The same search mechanism sup- 
ports Rule 3 (similar pattern) as well. According to that rule, 
two patterns that are dominant along the same directions are 
seen as similar, regardless of their color. In the same manner, 
textures with the same placement or repetition of the struc- 
tural element are seen as similar, even if the structural 
element is not exactly the same. Hence, the value of the 
distance function in the texture domain reflects either pattern 
identity or pattern similarity. For example, very small dis- 
tances mean that two patterns are exactly the same (implying 
that the rule of identity was used), whereas somewhat larger 
distances imply that the similarity was judged by the less 
rigorous rules of equal directionality or regularity. 

APPLYING RULE 2 (overall appearance): The actual 
implementation of this rule involves comparison of both 
color and texture features. Therefore, the search is first 
performed in the texture domain, using the above-described 
texture features and metrics. A set of selected patterns is then 
subjected to another search, this time in the color domain, 
using the above-described color features and color metrics. 
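
The two-stage search used for Rule 2 may be illustrated, without limitation, by the following Python sketch. The texture and color distance functions are passed in as arguments, and the size of the intermediate set (`shortlist`) and of the final result (`top`) are hypothetical parameters that the text does not specify.

```python
def overall_appearance_search(query, database, texture_dist, color_dist,
                              shortlist=20, top=5):
    """Rule 2 sketch: search by texture first, then re-rank by color.

    query:        representation of the query image.
    database:     iterable of image representations.
    texture_dist: function (query, image) -> texture distance.
    color_dist:   function (query, image) -> color distance.
    shortlist:    hypothetical size of the texture-selected set.
    top:          hypothetical number of matches to return.
    """
    # First search: keep the patterns closest in the texture domain.
    selected = sorted(database, key=lambda img: texture_dist(query, img))
    selected = selected[:shortlist]
    # Second search: within that set, order by distance in the color domain.
    return sorted(selected, key=lambda img: color_dist(query, img))[:top]
```

An image with a good color match but a poor texture match is excluded by the first stage and therefore never reaches the color comparison.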

APPLYING RULE 3 (similar pattern): The same mecha- 
nism as in Applying Rule 1 is used here. 

APPLYING RULE 4 (dominant color): According to the 
rule of dominant color, two patterns are perceived as similar 
if they possess the same color distributions regardless of
texture quality, texture content, directionality, placement or 
repetition of a structural element. This also holds for patterns 
that have the same dominant or overall color. Hence, this 
rule concerns only similarity in the color domain, and is 
applied by comparing color features only. 

6.0 Query Types and Other Search Examples 

As explained previously, one of the assumptions about the 
model used in the illustrative embodiment is that chromatic 
and achromatic components are processed through mostly 
separate pathways. Hence by separating color representation 
and color metric from texture representation and texture 
metric, the invention provides a system with a significant 
amount of flexibility in terms of manipulation of image 
features. This is an extremely important issue in many 
practical applications since it allows for different types of 
queries. As input into the system the user may be permitted 
to supply: a) a query and b) patterns to begin the search. The 
rules given above model typical human queries, such as: 
"find the same pattern" (Rule 1), "find all patterns with 
similar overall appearance" (Rule 2), "find similar patterns" 
(Rule 3), and "find all patterns of similar color", "find all 
patterns of a given color", and "find patterns that match a 
given pattern" (Rule 4). Moreover, due to the way the color 
codebook of the invention is designed, the system supports 
additional queries such as: "find darker patterns," "find more 
saturated patterns," "find simple patterns," "find
multicolored patterns," "find contrasting patterns." An input
pattern can be, e.g., supplied by the user, selected from a
database, given in the form of a sketch, or provided by any
other suitable technique. If the user has color preferences,
they can be specified either from the color codebook, or from
another pattern.

As an example, consider a query in which the user
provides an input pattern in the form of a sketch. There are
certain situations when the user is unable to supply an image
of the pattern he or she is trying to find. Hence, instead of
requiring the user to browse through the database manually,
the system may provide tools for sketching the pattern and
formulating a query based on the obtained bitmap image. In
that case, without any lowpass prefiltering, only a texture
feature vector is computed for the bitmap image and used in
the search. Furthermore, this search mechanism may allow
the user to specify a desired color, by selecting a color
$i = \{L_i, a_i, b_i\}$ from the color codebook. Then, the
search is performed in two iterations. First, a subset of
patterns is selected based on color similarity. Color
similarity between the color i and target image B, with the
color feature vector
$f_c(B) = \{(i_b, p_b) \mid b \in [1, N_B]\}$, is calculated as

$$d(i, B) = \min_{b \in [1, N_B]} D_c(i, i_b),$$

$$D_c(i, i_b) = \sqrt{(L_i - L_b)^2 + (a_i - a_b)^2 + (b_i - b_b)^2}.$$

Next, within the selected set, a search based on texture
features is performed to select the best match. A similar
search mechanism is applied for combination query, where
the desired pattern is taken from one input image and the
desired color from another image, or in a search where the
desired pattern is specified by an input image and the desired
color is selected from the color map.
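
The sketch-plus-color query described above may be illustrated, without limitation, by the following Python sketch. The dictionary layout of a database entry, the color distance cutoff used to form the selected set (`max_color_dist`) and the function names are assumptions made for the example; the texture distance is supplied by the caller.

```python
import math


def color_to_image_distance(color, dominant_colors):
    """d(i, B): smallest Lab distance from the chosen color to any
    dominant color of the target image."""
    return min(math.dist(color, c) for c in dominant_colors)


def sketch_query(sketch_texture, color, database, texture_dist,
                 max_color_dist):
    """Two-iteration search: select by color, then match by texture.

    database: list of dicts with hypothetical keys "colors" (list of Lab
              triples of the dominant colors) and "texture" (texture
              feature vector).
    Returns the best-matching entry, or None if no entry is close enough
    in color.
    """
    # First iteration: subset of patterns selected on color similarity.
    selected = [
        img for img in database
        if color_to_image_distance(color, img["colors"]) <= max_color_dist
    ]
    if not selected:
        return None
    # Second iteration: best texture match within the selected set.
    return min(
        selected,
        key=lambda img: texture_dist(sketch_texture, img["texture"]),
    )
```

The same structure could serve a combination query by taking the texture feature vector from one input image and the color from another.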

35 FIG. 4 shows an exemplary communication system appli- 
cation of the pattern retrieval and matching system 10 of 
FIG. 1. The communication system 100 includes a number 
of user terminals 102-i, i=l, 2, . . . N and a number of servers 
104-i, i-1, 2, . . . M. The user terminals 102-i and servers 

40 104-i communicate over a network 106. The user terminals 
102-i may represent, e.g., desktop, portable or palmtop 
computers, workstations, mainframe or microcomputers, 
television set-top boxes, or any other suitable type of communication
terminal, as well as portions or combinations of
such terminals.

The servers 104-i may be, e.g., computers, workstations, 
mainframe or microcomputers, etc. or various portions or 
combinations thereof. One or more of the servers 104-i may 
be co-located with one or more of the user terminals 102-i,
or geographically remote from all of the user terminals
102-i, depending on the specific implementation of the 
system 100. The network 106 may be, e.g., a global com- 
munication network such as the Internet, a wide area 
network, a local area network, a cable, telephone, wireless or
satellite network, as well as portions or combinations of
these and other networks. Each of the user terminals 102-i 
may include a processor 110 and a memory 112, and each of 
the servers 104-i may include a processor 114 and a memory 
116. The processors 110, 114 and memories 112, 116 may be
configured in a well-known manner to execute stored program
instructions to carry out various features of the invention
as previously described.

In operation, a user at one of the user terminals 102-i 
enters a query regarding a pattern for which the user desires
to find matching information in a database accessible by one
or more of the servers 104-i. FIG. 5 is a flow diagram
illustrating an example of this process as carried out in the
communication system of FIG. 4. In step 120, the user
utilizes a web browser or other suitable program running in 
terminal 102-i to log on to a web page associated with a 
source of pattern information and accessible over the network
106. The web page may be supported by one or more
of the servers 104-i. The user in step 122 selects from the 
web page a database or set of databases which the user 
would like to search. If the user does not specify a particular 
database, all of the databases associated with the web page 
may be searched. In step 124, the user supplies a query
image on which the search will be based. The query image
may be an image selected from a catalog accessible through
the web page, or a scanned image supplied by the user, e.g., in
the form of a sketch or other previously scanned or downloaded
image. The user in step 126 defines a query, i.e.,
specifies the other parameters of the search, such as the type 
of matching patterns that are of interest, the number of 
matches desired, etc. 

The user then launches the search by, e.g., clicking an 
appropriate button or icon on the web page. The query and
query image are then supplied over the network 106 to an 
appropriate one of the servers 104-i. In this embodiment, it 
is assumed that the system 10 of FIG. 1 is implemented by 
appropriate programming of one or more of the servers 
104-i. The system responds in step 130 by displaying to the
user at terminal 102-i a specified number of the best 
matches. In step 132, the user can continue the process by 
modifying the search, launching another search, e.g., with a 
new query image or set of query parameters, or can exit the 
system.
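
The query flow of steps 122 through 130 can be sketched on the server side as follows. This is a minimal sketch under stated assumptions: the `Query` structure, the `run_query` function, the dictionary-based `catalog`, and the `distance` callable are all hypothetical names introduced for illustration, and the distance computation of the system 10 is passed in as a parameter rather than implemented.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Features = Sequence[float]  # stand-in for an image or its extracted features


@dataclass
class Query:
    """Hypothetical container for the user's selections."""
    image: Features                          # query image (step 124)
    databases: Optional[List[str]] = None    # step 122; None means search all
    match_type: str = "similar pattern"      # type of matching pattern (step 126)
    num_matches: int = 3                     # number of matches desired (step 126)


def run_query(query: Query,
              catalog: Dict[str, Dict[str, Features]],
              distance: Callable[[Features, Features, str], float]
              ) -> List[Tuple[str, str]]:
    """Return (database, image id) pairs for the best matches (step 130)."""
    # If no database was specified, search every database behind the web page.
    names = query.databases if query.databases else list(catalog)
    scored = []
    for db_name in names:
        for image_id, stored in catalog[db_name].items():
            d = distance(query.image, stored, query.match_type)
            scored.append((d, db_name, image_id))
    scored.sort()  # smallest distance first
    return [(db, img) for _, db, img in scored[:query.num_matches]]
```

Modifying the search in step 132 would then amount to building a new `Query` with a different image or parameters and calling `run_query` again.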

It should be noted that the particular implementation of 
the communication system 100 will vary depending on the 
specific application. For example, in certain applications,
such as interior design stores or other facilities, it may be
desirable to have the user terminals geographically co-located
with one or more of the servers. In an Internet-based
application, the user terminals may represent personal
computers at the users' homes or offices, and the servers
may represent, e.g., a server cluster at a remote location
designed to process a large number of user queries received
from around the world.
Many other applications are of course possible. 

The invention has been described above in conjunction 
with an illustrative embodiment of a pattern retrieval and 
matching system. However, it should be understood that the 
invention is not limited to use with the particular configurations
shown. For example, other embodiments of the
invention may take into account image content or
domain-specific information in performing image retrieval and
matching. In addition, the invention can be applied to other 
types of information signals, such as, for example, video
information signals in the form of sequences of video 
frames. Numerous other alternative embodiments within the 
scope of the following claims will be apparent to those 
skilled in the art. 

What is claimed is:

1. A system for processing information signals, the system
comprising: 

a processor operative to compare a first information signal 
to a second information signal in response to a user 
query, wherein the first information signal is selected
by the user and the second information signal is stored 
in a database associated with the system, wherein the 
processor extracts color and texture information from at 
least the first information signal using a predetermined 
vocabulary comprising one or more dimensions, and
generates a distance measure characterizing the relationship
of the first information signal to the second
information signal by applying a grammar comprising
a set of predetermined rules to the color and texture 
information extracted from the first information signal 
and corresponding color and texture information asso- 
ciated with the second information signal, wherein the 
processor receives the first information signal in the 
form of an input image A submitted in conjunction with 
a query from the user, and wherein the processor is 
further operative to measure dimensions $\mathrm{DIM}_i(A)$ from
the vocabulary, for i=1, . . . , N, and for each image B
from an image database, to apply rules $R_i$ from the
grammar to obtain corresponding distance measures
$\mathrm{dist}_i(A, B)$, where $\mathrm{dist}_i(A, B)$ is the distance between
the images A and B according to the rule i.

2. The system of claim 1 wherein at least one of the first 
and second information signals comprises an image. 

3. The system of claim 1 wherein at least one of the first 
and second information signals comprises a sequence of 
video frames.

4. The system of claim 1 wherein the vocabulary com- 
prises one or more of the following dimensions: overall 
color, directionality and orientation, regularity and 
placement, color purity, and pattern complexity and 
heaviness, and the system generates for at least one of the 
first and second information signals a set of values associ- 
ated with one or more of the dimensions. 

5. The system of claim 4 wherein the system applies the 
rules to the set of values. 

6. The system of claim 5 wherein the equal pattern rule is
a function of the dimension of directionality and orientation 
and the dimension of regularity and placement. 

7. The system of claim 5 wherein the overall appearance 
rule is a function of the dimension of overall color and the 
dimension of directionality and orientation. 

8. The system of claim 5 wherein the similar pattern rule 
is a function of at least one of the dimension of directionality 
and orientation and the dimension of regularity and place- 
ment. 

9. The system of claim 5 wherein the dominant color rule 
is a function of the dimension of overall color. 

10. The system of claim 5 wherein the general impression 
rule is a function of the dimension of color purity and the 
dimension of pattern complexity. 

11. The system of claim 5 wherein each rule is expressed 
as a logical combination of one or more of the values 
generated for at least a subset of the dimensions. 

12. The system of claim 1 wherein the processor is further 
operative to extract an achromatic pattern map from the first 
information signal using a color distribution generated from 
the first information signal. 

13. The system of claim 1 wherein the color distribution 
is estimated using a set of color codebooks, with each of the 
color codebooks corresponding to a different luminance 
level of the first information signal. 

14. The system of claim 1 wherein the processor is further 
operative to generate a color metric characterizing the 
similarity of the color information associated with the first 
and second information signals. 

15. The system of claim 1 wherein the processor is further 
operative to generate a texture metric characterizing the 
similarity of the texture information associated with the first 
and second information signals. 

16. A system for processing information signals, the 
system comprising: 

a processor operative to compare a first information signal 
to a second information signal in response to a user 
query, wherein the first information signal is selected 
by the user and the second information signal is stored
in a database associated with the system, wherein the 
processor extracts color and texture information from at 
least the first information signal using a predetermined 
vocabulary comprising one or more dimensions, and
generates a distance measure characterizing the rela- 
tionship of the first information signal to the second 
information signal by applying a grammar comprising 
a set of predetermined rules to the color and texture 
information extracted from the first information signal
and corresponding color and texture information asso- 
ciated with the second information signal, wherein the 
set of predetermined rules comprises one or more of the 
following rules: equal pattern, overall appearance, 
similar pattern, dominant color and general impression,
wherein the processor receives the first information 
signal in the form of an input image A submitted in 
conjunction with a query from the user, and wherein the 
processor is further operative to measure dimensions
$\mathrm{DIM}_i(A)$ from the vocabulary, for i=1, . . . , N, and for
each image B from an image database, to apply rules $R_i$
from the grammar to obtain corresponding distance
measures $\mathrm{dist}_i(A, B)$, where $\mathrm{dist}_i(A, B)$ is the distance
between the images A and B according to the rule i.

17. A system for processing information signals, the
system comprising: 

a processing device having a processor coupled to a 
memory, the processor being operative to compare a 
first information signal to a second information signal 
in response to a user query, wherein the first information
signal is selected by the user and the second
information signal is stored in a database associated 
with the system, wherein the processor extracts color 
and texture information from at least the first information
signal using a predetermined vocabulary comprising
one or more dimensions, and generates a distance
measure characterizing the relationship of the first 
information signal to the second information signal by 
applying a grammar comprising a set of predetermined 
rules to the color and texture information extracted
from the first information signal and corresponding 
color and texture information associated with the second
information signal, wherein the set of predetermined
rules comprises one or more of the following
rules: equal pattern, overall appearance, similar pattern, 
dominant color and general impression, wherein the 
processor receives the first information signal in the 
form of an input image A submitted in conjunction with 
a query from the user, and wherein the processor is 
further operative to measure dimensions $\mathrm{DIM}_i(A)$ from
the vocabulary, for i=1, . . . , N, and for each image B
from an image database, to apply rules $R_i$ from the
grammar to obtain corresponding distance measures
$\mathrm{dist}_i(A, B)$, where $\mathrm{dist}_i(A, B)$ is the distance between
the images A and B according to the rule i.

18. A system for processing information signals, the 
system comprising: 

a processing device having a processor coupled to a 
memory, the processor being operative to compare a 
first information signal to a second information signal 
in response to a user query, wherein the first informa- 
tion signal is selected by the user and the second 
information signal is stored in a database associated 
with the system, wherein the processor extracts color 
and texture information from at least the first informa- 
tion signal using a predetermined vocabulary compris- 
ing one or more dimensions, and generates a distance 
measure characterizing the relationship of the first 
information signal to the second information signal by 
applying a grammar comprising a set of predetermined 
rules to the color and texture information extracted 
from the first information signal and corresponding 
color and texture information associated with the sec- 
ond information signal, wherein the processor receives 
the first information signal in the form of an input 
image A submitted in conjunction with a query from the
user, and wherein the processor is further operative to
measure dimensions $\mathrm{DIM}_i(A)$ from the vocabulary, for
i=1, . . . , N, and for each image B from an image
database, to apply rules $R_i$ from the grammar to obtain
corresponding distance measures $\mathrm{dist}_i(A, B)$, where
$\mathrm{dist}_i(A, B)$ is the distance between the images A and B
according to the rule i.

* * * * *